---
title: "Titanic Analysis"
output:
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    storyboard: true
    social: ["linkedin", "twitter", "facebook", "pinterest", "menu"]
    source_code: embed
    theme: readable
---
```{r setup, include=FALSE}
# R Libraries
library(flexdashboard)
library(tidyverse)
library(skimr)
library(caret)
library(DT)
library(plotly)
```
```{r}
# Data preparation (run once; kept commented out for reference)
# train <- read.csv("train.csv")
# test <- read.csv("test.csv")

##### Combine datasets
# train <- train %>%
#   relocate(Survived, .after = Embarked) %>%
#   mutate(source = "train")
# test <- test %>%
#   mutate(source = "test")
# titanic <- full_join(train, test)

##### Data cleaning
# titanic_c <- titanic %>%
#   dplyr::select(-PassengerId, -Name, -Ticket, -Cabin) %>%
#   mutate_if(is.character, as.factor) %>%
#   mutate(Pclass = as.factor(Pclass),
#          Survived = as.factor(Survived),
#          family_size = SibSp + Parch + 1) %>%
#   relocate(family_size, .after = Parch)

##### Fill missing values in Fare with the median fare
# titanic_c <- titanic_c %>%
#   mutate(Fare = replace_na(Fare, median(titanic_c$Fare, na.rm = TRUE)))

##### Fill missing values in Embarked with the most frequent level ("S")
# titanic_c$Embarked[titanic_c$Embarked == ""] <- "S"

##### Replace NA in Age with an imputation model
# dummy_formula <- dummyVars(~., data = titanic_c[, -9])
# titanic_c_dummy <- dummy_formula %>% predict(titanic_c[, -9])

##### Impute with bagged tree models
# BagImpute_formula <- titanic_c_dummy %>% preProcess(method = "bagImpute")
# imputed.data <- BagImpute_formula %>% predict(titanic_c_dummy)

##### Extract the imputed Age column from the dummy-encoded data
# titanic_c$Age <- imputed.data[, 6]

##### Export the prepared data
# write.csv(titanic, "titanic_full.csv")
# write.csv(titanic_c, "titanic_c.csv") # assumed: this file is read below, but its export was not shown
```
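The commented-out preparation steps above fill missing `Fare` values with the median and blank `Embarked` values with the most frequent port. A minimal base R sketch of those two fills follows. The data frame `toy` and its values are made up for illustration and are not taken from the Titanic files, and the bagged-tree imputation of `Age` is not reproduced here.

```r
# Toy data standing in for two Titanic columns (values are hypothetical)
toy <- data.frame(
  Fare     = c(7.25, 71.28, NA, 8.05, 53.10),
  Embarked = c("S", "C", "", "S", "Q"),
  stringsAsFactors = FALSE
)

# Median imputation: replace missing fares with the median of the observed fares
fare_median <- median(toy$Fare, na.rm = TRUE)
toy$Fare[is.na(toy$Fare)] <- fare_median

# Mode imputation: replace blank ports with the most frequent non-blank value
port_counts <- table(toy$Embarked[toy$Embarked != ""])
port_mode <- names(port_counts)[which.max(port_counts)]
toy$Embarked[toy$Embarked == ""] <- port_mode

toy
```

Computing the mode from the data, rather than hard-coding `"S"`, keeps the fill correct if the input changes.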
```{r}
# Data import
titanic <- read.csv("titanic_full.csv")
titanic_c <- read.csv("titanic_c.csv")

# Data cleaning
titanic <- titanic %>%
  dplyr::select(-X) # Remove the row-number column "X"
titanic_c <- titanic_c %>%
  dplyr::select(-X) %>% # Remove the row-number column "X"
  mutate(Age = round(Age),
         Pclass = as.factor(Pclass),
         Survived = as.factor(Survived)) %>%
  mutate_if(is.character, as.factor)
```
Interactive Visualisation
===========================
Column 1 {data-width=400}
---------------------------
{width=80%}
### Info
RMS Titanic sank in the North Atlantic Ocean on 15 April 1912 after striking an iceberg during her voyage from Southampton, South East England, to New York City. An estimated 2,224 passengers and crew were on board, and more than 1,500 died.

The dataset used in this analysis has only 1,309 rows of passenger information, so any results from this analysis should be treated as estimates.
### Death Statistics (Wikipedia)
```{r}
gauge(1500,
      min = 0,
      max = 2224,
      sectors = gaugeSectors(colors = "red"))
```
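The figures quoted in this column can be turned into shares with a little arithmetic. The sketch below uses only the numbers stated above (about 1,500 deaths, 2,224 people aboard, 1,309 rows in the dataset); the variable names are illustrative.

```r
aboard <- 2224        # estimated passengers and crew
deaths <- 1500        # lower bound quoted above ("more than 1,500")
dataset_rows <- 1309  # rows in the dataset used here

death_share <- deaths / aboard        # share of people aboard who died
coverage <- dataset_rows / aboard     # share of people aboard present in the dataset

round(100 * death_share, 1)  # 67.4
round(100 * coverage, 1)     # 58.9
```

The dataset therefore covers a little under three fifths of the people aboard, which is why the counts in this dashboard are best read as estimates.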
Column 2 {data-width=100}
---------------------------
### Passenger Count
```{r}
passenger_count <- nrow(titanic_c) # plain integer rather than a one-cell data frame
valueBox(passenger_count, icon = "fa-users")
```
### Number of Males
```{r}
male <- titanic_c %>% filter(Sex == "male") %>% nrow()
valueBox(male, icon = "fa-mars", color = "grey")
```
### Number of Females
```{r}
female <- titanic_c %>% filter(Sex == "female") %>% nrow()
valueBox(female, icon = "fa-venus", color = "grey")
```
### Kids < 12
```{r}
# Half-open age bands: kid [0, 12), teen [12, 19), adult [19, 65), elder 65+.
# Chaining the upper bounds keeps boundary ages (12 and 19) in the intended
# band instead of letting them fall through to "elder".
titanic_c <- titanic_c %>%
  mutate(age_group = case_when(Age < 12 ~ "kid",
                               Age < 19 ~ "teen",
                               Age < 65 ~ "adult",
                               Age >= 65 ~ "elder"),
         age_group = factor(age_group, levels = c("kid", "teen", "adult", "elder")))
kid <- titanic_c %>% filter(age_group == "kid") %>% nrow()
valueBox(kid, color = "orange")
```
### Teens 12 - 18
```{r}
teen <- titanic_c %>% filter(age_group == "teen") %>% nrow()
valueBox(teen, color = "orange")
```
### Adults 19 - 64
```{r}
adult <- titanic_c %>% filter(age_group == "adult") %>% nrow()
valueBox(adult, color = "orange")
```
### Elders 65+
```{r}
elder <- titanic_c %>% filter(age_group == "elder") %>% nrow()
valueBox(elder, color = "orange")
```
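The age bands used for the boxes above can also be written with base R's `cut()`. This is a minimal sketch, assuming the bands are meant to be the half-open intervals [0, 12), [12, 19), [19, 65) and 65 or older; the `ages` vector is hypothetical and chosen to sit on the band boundaries.

```r
# Hypothetical ages on and around the band boundaries
ages <- c(5, 11, 12, 18, 19, 64, 65, 80)

# right = FALSE makes each interval closed on the left: [0, 12), [12, 19), ...
age_group <- cut(ages,
                 breaks = c(0, 12, 19, 65, Inf),
                 labels = c("kid", "teen", "adult", "elder"),
                 right = FALSE)

table(age_group)
```

Listing the break points in one place makes it easy to check that no whole-number age is left without a band.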
Column 3 {data-width=500}
----------------------------
### Ticket Classes
```{r}
# Count passengers per ticket class
tc <- titanic_c
tc_class <- tc %>%
  group_by(Pclass) %>%
  summarise(count = n())

# Bar chart of passenger counts by ticket class
p1 <- ggplot(tc_class, aes(x = Pclass, y = count, fill = Pclass)) +
  geom_col() +
  theme_bw() +
  theme(plot.title = element_text(face = "bold"),
        legend.position = "none") +
  labs(x = "Ticket class",
       y = "Passenger count")
ggplotly(p1)
```
### Ticket Prices
```{r}
# Fare distribution by ticket class, one panel per age group;
# the black cross marks the mean fare
plot_tp <- ggplot(tc, aes(x = Pclass, y = Fare, colour = Pclass)) +
  geom_boxplot(outlier.shape = NA) +
  facet_wrap(~ age_group, ncol = 4) +
  theme_bw() +
  stat_summary(fun = "mean", geom = "point", size = 5, shape = 4, color = "black") +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold")) +
  labs(x = "Ticket class",
       y = "Ticket fare")
ggplotly(plot_tp)
```
Data Table
=========================
```{r}
datatable(titanic, options = list(pageLength = 50))
```
About
=========================
*References*

Dave Langer 2017, *Intro to Machine Learning with R & caret*, Data Science Dojo, viewed 22 October 2021, https://www.youtube.com/watch?v=z8PRU46I3NY&t=1492s

Kaggle 2021, *Titanic - Machine Learning from Disaster*, viewed 22 October 2021, https://www.kaggle.com/c/titanic/data?select=gender_submission.csv

Willy Stöwer, "Untergang der Titanic" (Sinking of the Titanic), *Die Gartenlaube* magazine, public domain, https://commons.wikimedia.org/w/index.php?curid=97646